Using circlestone-labs' Anima in ComfyUI

Anima's strengths are that it is lightweight and that it can generate NSFW multi-character interaction scenes.

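The text encoder has only 0.6B parameters (current lightweight models mostly use a 4B one), so it cannot follow detailed instructions. For example, you cannot specify panel positions, you cannot specify poses in natural language, and tags bleed between characters.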

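Anima can only produce poses that exist as Danbooru tags and can only draw objects that exist as Danbooru tags. Z-Image and FLUX.2 klein accept poses described in natural language and know a wide variety of objects, but Anima has a weak text encoder, a small model, and a biased dataset, so it is not versatile.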

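Even with its limited text encoder, however, it can handle counts and left/right placement. For example, you can specify the number of thigh straps and which side they are on, or keep an asymmetric outfit consistent.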

So rather than doing everything in Anima, accept a division of labor: use editing models such as FLUX.2 klein or Qwen Image Edit for anything that is not covered by Danbooru tags.

If you use artist tags, the following workflow is already practical:

Draft the base image with Anima
Upscale
i2i with an Illustrious-derived model (add detail, increase sharpness, lock in the art style)
Segmentation with SAM2 or SAM3 (optional)
Detailer with an Illustrious-derived model

Drawbacks

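Cannot draw text
Background quality is poor (compared with Z-Image Turbo and FLUX.2 klein)
Tags bleed between characters, so prompts get long when you include multiple characters
Cannot specify things such as manga panel positions
Cannot draw things that are not in Danbooru tags
Cannot be told in natural language to use poses that are not in Danbooru tags
Hands melt (this is currently a preview release; fine-tuning at higher resolution should improve this)
Cannot generate high-resolution images (this is currently a preview release; fine-tuning at higher resolution should improve this)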

Models

The VAE is the same as Qwen Image (Edit).

Folder	File
models/unet	anima-preview.safetensors
models/text_encoders	qwen_3_06b_base.safetensors
models/vae	qwen_image_vae.safetensors
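A minimal sketch for checking that the three files sit in the folders listed above, assuming a standard ComfyUI install where `models/` lives directly under the ComfyUI root. The helper name and the `"ComfyUI"` root path are placeholders, not part of ComfyUI.

```python
from pathlib import Path

# Folder (relative to the ComfyUI root) -> file name, as in the table above.
REQUIRED_FILES = {
    "models/unet": "anima-preview.safetensors",
    "models/text_encoders": "qwen_3_06b_base.safetensors",
    "models/vae": "qwen_image_vae.safetensors",
}


def missing_model_files(comfy_root: str) -> list[str]:
    """Return the relative paths of required model files that are not present."""
    root = Path(comfy_root)
    missing = []
    for folder, name in REQUIRED_FILES.items():
        if not (root / folder / name).is_file():
            missing.append(f"{folder}/{name}")
    return missing


if __name__ == "__main__":
    # "ComfyUI" is a placeholder path; point it at your own install.
    for path in missing_model_files("ComfyUI"):
        print("missing:", path)
```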

Workflow

Drag example.png into ComfyUI to load the workflow.

Settings

Generation settings

Resolution: 1MP
30-50 steps
4-5 CFG
Samplers

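er_sde: neutral style, flat colors, sharp lines. The default recommended sampler.
euler_a (euler_ancestral in ComfyUI): soft, with thin lines. Suited to 2.5D. Saturation stays relatively low even at high CFG.
dpmpp_2m_sde_gpu: similar to er_sde, but the output varies more from image to image.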
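The recommended ranges above can be written as a small pre-flight check. This is a hypothetical helper, not part of ComfyUI. The sampler names are ComfyUI's internal names, "1MP" is read as 1024 x 1024 pixels, and the 10% tolerance around 1MP is an arbitrary assumption.

```python
RECOMMENDED_SAMPLERS = {"er_sde", "euler_ancestral", "dpmpp_2m_sde_gpu"}


def check_settings(width: int, height: int, steps: int, cfg: float, sampler: str) -> list[str]:
    """Return warnings for settings outside the recommended ranges."""
    warnings = []
    # 1MP taken as 1024 * 1024 pixels; the +/-10% tolerance is an assumption.
    megapixels = width * height / (1024 * 1024)
    if not 0.9 <= megapixels <= 1.1:
        warnings.append(f"resolution is {megapixels:.2f} MP, recommended ~1 MP")
    if not 30 <= steps <= 50:
        warnings.append(f"steps={steps}, recommended 30-50")
    if not 4 <= cfg <= 5:
        warnings.append(f"cfg={cfg}, recommended 4-5")
    if sampler not in RECOMMENDED_SAMPLERS:
        warnings.append(f"sampler {sampler!r} is not one of the recommended samplers")
    return warnings


print(check_settings(1024, 1024, 30, 5, "er_sde"))  # []
print(check_settings(1024, 1408, 20, 7, "euler"))
```

Larger sizes such as the highres resolutions below are deliberate exceptions, so a warning here is a prompt to double-check rather than an error.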

Resolution

The following resolutions can be generated (put highres in the prompt and use euler_ancestral as the sampler). When using Anima as a Detailer, 1024 x 1024 is the safe choice.

1536 x 1536
1856 x 1024

Prompts

Improving drawing quality

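Use artist tags. Put a half-width @ before the artist name (@artist name). Putting low-skill artists in the negative prompt makes results very stable. If you want a thick-paint (painterly) look, put artists who use anime-style coloring in the negative prompt.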

Artist names in the positive prompt only

Different artist names in both the positive and negative prompts

Tag order

[quality/meta/year/safety tags] [1girl/1boy/1other etc] [character] [series] [artist] [general tags]
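A sketch of assembling a tag prompt in the order above. The category keys are illustrative names (hypothetical, not an Anima or ComfyUI API); the `@` prefix for artist names follows the artist-tag convention described earlier.

```python
# Categories in the recommended order. "meta" stands for the
# quality/meta/year/safety group; all key names are illustrative.
TAG_ORDER = ["meta", "count", "character", "series", "artist", "general"]


def build_prompt(tags: dict[str, list[str]]) -> str:
    """Join tags category by category in the recommended order."""
    parts = []
    for category in TAG_ORDER:
        for tag in tags.get(category, []):
            if category == "artist" and not tag.startswith("@"):
                tag = "@" + tag  # artist tags take a half-width @ prefix
            parts.append(tag)
    return ", ".join(parts)


prompt = build_prompt({
    "general": ["smile", "santa hat"],
    "artist": ["nnn yryr"],
    "meta": ["year 2025", "newest", "highres", "safe"],
    "count": ["1girl"],
    "character": ["oomuro sakurako"],
    "series": ["yuru yuri"],
})
print(prompt)
# year 2025, newest, highres, safe, 1girl, oomuro sakurako, yuru yuri, @nnn yryr, smile, santa hat
```

Keeping the tags in a dict until the last moment means the order is fixed by `TAG_ORDER` no matter how the tags were collected.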

Quality tags from both the masterpiece family and the Pony V7 family work. Note that the Pony V7 tags require underscores.

Category	Example tags
Quality tags	masterpiece, best quality, good quality, normal quality, low quality, worst quality; Pony V7 aesthetic-model based: score_9, score_8, ..., score_1
Year tags	year 2025, year 2024, ..., newest, recent, mid, early, old
Meta tags	highres, absurdres, anime screenshot, jpeg artifacts, official art, etc.
Safety tags	safe, sensitive, nsfw, explicit
Artist tags	@artist name
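Because the Pony V7 score tags need their underscores while the other tags in the examples here are written with spaces, a tag normalizer must treat score tags specially. A minimal sketch; the rule "score_N keeps its underscore, other word_word underscores become spaces" is an assumption based on the note above and the example prompt, and emoticon tags such as `^_^` are deliberately left untouched.

```python
import re

SCORE_TAG = re.compile(r"^score[ _]([1-9])$")
# An underscore between two word characters, e.g. brown_hair (but not ^_^).
WORD_UNDERSCORE = re.compile(r"(?<=\w)_(?=\w)")


def normalize_tag(tag: str) -> str:
    """Use spaces in ordinary tags but keep score_N in its underscore form."""
    tag = tag.strip()
    match = SCORE_TAG.match(tag)
    if match:
        return f"score_{match.group(1)}"
    return WORD_UNDERSCORE.sub(" ", tag)


tags = ["score 9", "score_8", "best_quality", "brown_hair", "^_^", "santa hat"]
print(", ".join(normalize_tag(t) for t in tags))
# score_9, score_8, best quality, brown hair, ^_^, santa hat
```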

Example

year 2025, newest, normal quality, score_5, highres, safe, 1girl, oomuro sakurako, yuru yuri, @nnn yryr, smile, brown hair, hat, solo, fur-trimmed gloves, open mouth, long hair, gift box, fang, skirt, red gloves, blunt bangs, gloves, one eye closed, shirt, brown eyes, santa costume, red hat, skin fang, twitter username, white background, holding bag, fur trim, simple background, brown skirt, bag, gift bag, looking at viewer, santa hat, ;d, red shirt, box, gift, fur-trimmed headwear, holding, red capelet, holding box, capelet

Natural language

Natural language prompting tips

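When writing the prompt entirely in natural language, use at least two sentences; if it is too short, the image does not come out properly. Quality tags are still required.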

You can mix natural language and tags.
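A rough pre-flight check for the two rules above (at least two sentences, and a quality tag present). The sentence count splits on `.`, `!` and `?`, which is a crude heuristic; the quality-tag set mirrors the tag table in the tag-order section. Tags and sentences may be mixed, as the paragraph above allows.

```python
import re

QUALITY_TAGS = {
    "masterpiece", "best quality", "good quality",
    "normal quality", "low quality", "worst quality",
} | {f"score_{n}" for n in range(1, 10)}


def check_natural_language_prompt(prompt: str) -> list[str]:
    """Flag prompts that break the two-sentence and quality-tag rules."""
    problems = []
    # Crude heuristic: count non-empty chunks between sentence punctuation.
    sentences = [s for s in re.split(r"[.!?]+", prompt) if s.strip()]
    if len(sentences) < 2:
        problems.append("fewer than two sentences")
    # Treat every comma- or period-separated chunk as a potential tag.
    chunks = {c.strip().lower() for c in re.split(r"[,.]", prompt)}
    if not chunks & QUALITY_TAGS:
        problems.append("no quality tag")
    return problems


print(check_natural_language_prompt("A girl is standing."))
print(check_natural_language_prompt(
    "masterpiece, best quality. A girl is standing in a room. She is smiling."
))  # []
```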

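Write a character's attributes immediately after the character's name. With multiple characters this rule is mandatory. Merely listing character names is also bad, because their features blend together, so describe each character's features as well.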

"Digital artwork of Fern from Sousou no Frieren, with long purple hair and purple eyes, wearing a black coat over a white dress with puffy sleeves..."
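When several characters appear, the rule above can be enforced by construction: every name is emitted together with its own attributes, and a bare name is rejected. A minimal sketch; the `Character` dataclass and the "name from series, with ..." template are hypothetical, modeled loosely on the Fern example.

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    name: str
    series: str = ""
    attributes: list[str] = field(default_factory=list)


def describe(character: Character) -> str:
    """One sentence per character: name (and series) first, then attributes."""
    if not character.attributes:
        # A bare name invites feature mixing when several characters appear.
        raise ValueError(f"{character.name} has no attributes")
    subject = character.name
    if character.series:
        subject += f" from {character.series}"
    return f"{subject}, with {', '.join(character.attributes)}."


def multi_character_prompt(quality: str, characters: list[Character]) -> str:
    """Quality tags first, then one attribute sentence per character."""
    return " ".join([quality] + [describe(c) for c in characters])


fern = Character("Fern", "Sousou no Frieren", ["long purple hair", "purple eyes"])
frieren = Character("Frieren", "Sousou no Frieren", ["white twintails", "green eyes"])
print(multi_character_prompt("masterpiece, best quality.", [fern, frieren]))
```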

Tag bleeding

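If you specify clothing for only one of several people, the others wear it too. The same goes for expressions and poses. Once you specify any attribute (clothing, expression, pose, position) for one character, you must specify it for every character; otherwise the tag bleeds and you get unintended results.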
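The rule above can be checked mechanically if the prompt is kept as per-character data before it is flattened into text. A sketch under that assumption; the category names and the dict layout are hypothetical.

```python
CATEGORIES = ("clothing", "expression", "pose", "position")


def bleed_risks(characters: dict[str, dict[str, str]]) -> list[str]:
    """List categories set for some characters but not all (bleed candidates)."""
    risks = []
    for category in CATEGORIES:
        specified = [name for name, attrs in characters.items() if attrs.get(category)]
        # Only a partial specification is risky; none or all is fine.
        if specified and len(specified) < len(characters):
            for name, attrs in characters.items():
                if not attrs.get(category):
                    risks.append(
                        f"{name}: {category} unspecified (set for {', '.join(specified)})"
                    )
    return risks


scene = {
    "girl A": {"clothing": "sailor shirt", "expression": "smile"},
    "girl B": {"expression": "angry"},
}
for risk in bleed_risks(scene):
    print(risk)
# girl B: clothing unspecified (set for girl A)
```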

closed eyes is a problem because there is no counterpart open eyes tag. Specifying an eye color, such as brown eyes, makes open eyes more likely.

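cutout also bleeds easily. If you specify either navel cutout or breast cutout, the other one appears frequently. With two people, an instruction like navel cutout for one and breast cutout for the other is almost never followed.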

Weighting syntax

AnimaTokenizer.tokenize_with_weights() in comfy/text_encoders/anima.py resets the weights of the Qwen 3 0.6B tokens to 1.0, but the t5xxl tokenizer side keeps its weights, so weighting syntax such as (tag:2) still works.

Anima tokenizes the prompt with two tokenizers, Qwen3 0.6B and t5xxl^*1, then encodes both with Qwen 3 0.6B to create two embeddings^*2. These are concatenated and fed into the DiT^*3.

*1: comfy/text_encoders/anima.py, tokenize_with_weights

*2: comfy/text_encoders/anima.py, encode_token_weights

*3: comfy/samplers.py, calc_cond_batch, cond_cat
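To make the weighting behavior described above concrete, here is a toy model of it: parse `(text:weight)` groups, then build one stream whose weights are reset to 1.0 (the Qwen side) and one that keeps them (the t5xxl side). This is a simplified illustration, not ComfyUI's parser or tokenizer: it works on text segments instead of token IDs, supports only flat `(text:weight)` groups, and ignores nesting, bare `(text)` groups and escapes.

```python
import re

# A flat "(text:weight)" group, e.g. "(smile:1.5)"; nesting is not supported.
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def parse_weights(prompt: str) -> list[tuple[str, float]]:
    """Split a prompt into (text, weight) segments; unweighted text gets 1.0."""
    segments = []
    pos = 0
    for match in WEIGHTED.finditer(prompt):
        if match.start() > pos:
            segments.append((prompt[pos:match.start()], 1.0))
        segments.append((match.group(1), float(match.group(2))))
        pos = match.end()
    if pos < len(prompt):
        segments.append((prompt[pos:], 1.0))
    return segments


def two_streams(prompt: str) -> dict[str, list[tuple[str, float]]]:
    """Toy version of the described behavior: Qwen weights reset, t5xxl kept."""
    segments = parse_weights(prompt)
    return {
        "qwen3_06b": [(text, 1.0) for text, _ in segments],
        "t5xxl": segments,
    }


streams = two_streams("1girl, (smile:2), hat")
print(streams["t5xxl"])      # [('1girl, ', 1.0), ('smile', 2.0), (', hat', 1.0)]
print(streams["qwen3_06b"])  # [('1girl, ', 1.0), ('smile', 1.0), (', hat', 1.0)]
```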

Borders and bars

If black borders appear, add highres to the prompt. Putting border, pillarboxed, or letterboxed in the negative prompt has little effect.

tips

Size specifications sometimes work and sometimes don't. The basic approach is to describe what is actually visible.

Bad example: small navel cutout or large navel cutout
Good example: navel cutout, Her abdomen is mostly exposed.

Inference speed

Environment

Windows11 25H2
RTX3050
RAM 32 GB
python 3.12.9
torch 2.9.1+cu128
triton_windows-3.5.1.post23
sageattention-2.2.0+cu128torch2.9.0.post4
CFG 5

Without SageAttention

Resolution	Speed (s/it)	Time for 30 steps (s)
1024 x 1024	2.5	80
1024 x 1408	3.7	110

With SageAttention

Resolution	Speed (s/it)	Time for 30 steps (s)
1024 x 1024	2.32	70
1024 x 1408	3.30	100
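The two tables can be compared with simple arithmetic: s/it times the step count gives the time spent in the sampler, and the ratio of s/it values gives the per-step speedup from SageAttention. The s/it figures below are the measured values from the tables. The 30-step totals in the tables are rounded and presumably include work outside the sampler, so they differ from s/it x 30 by a few seconds.

```python
# Measured seconds per iteration from the tables above (RTX3050, CFG 5).
S_PER_IT = {
    "1024 x 1024": {"without": 2.5, "with": 2.32},
    "1024 x 1408": {"without": 3.7, "with": 3.30},
}


def sampling_seconds(s_per_it: float, steps: int = 30) -> float:
    """Time spent in the sampler alone: seconds per iteration times steps."""
    return s_per_it * steps


def speedup_percent(without: float, with_sage: float) -> float:
    """How much faster each step is with SageAttention, in percent."""
    return (without / with_sage - 1) * 100


for resolution, s in S_PER_IT.items():
    print(
        f"{resolution}: {sampling_seconds(s['without']):.1f} s -> "
        f"{sampling_seconds(s['with']):.1f} s, "
        f"{speedup_percent(s['without'], s['with']):.1f}% faster per step"
    )
# 1024 x 1024: 75.0 s -> 69.6 s, 7.8% faster per step
# 1024 x 1408: 111.0 s -> 99.0 s, 12.1% faster per step
```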

Examples

Settings:

30 steps
cfg 5
sampler: er_sde
scheduler: simple

masterpiece, best quality, @artist tag.

There are three girls in a room.

The girl on the left has short red hair and blue eyes. She is sitting on a stool. She is wearing a pink camisole and gray dolphin shorts.

The girl in the middle has long silver hair and red eyes. She is standing. She is wearing a white collared shirt and a black pencil skirt.

The girl on the right has medium brown hair and green eyes. She is sitting on a stool. She is wearing a beige sweater and a blue denim.

There is a potted plant, a frying pan on the kitchen wall in the background.


Negative prompt

worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia

@artist tag, masterpiece, best quality

# play

uncensored, vaginal, sex from behind, rough sex, from side

# girl

on the left side, skinny, black long hair, eyelashes, red eyes, open mouth, medium breasts, choker, blue cotton panties, panties aside, pigeon-toed, barefoot

# boy

on the right side, grabbing another's ass

# effect

cum in pussy, sound effects, motion lines, sweat

# background 

indoors, carpet, table, door, painting (object)


Negative prompt

worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia

@artist tag, masterpiece, best quality, indoors.

There are four balls on a table. 

A left top ball is red. A blue right top ball is twice the size of the red ball. A yellow left bottom ball is twice the size of the blue ball. A green right bottom ball is twice the size of the yellow ball.


Negative prompt

worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia, sketch, english text, engrish text, chinese text, korean text

It follows color, position, and count faithfully, but no matter how many times I tried, the sizes never matched the instructions.

@artist tag, masterpiece, best quality, 2girls, outdoors.

A girl with brown hair wearing a school uniform is standing.

Another girl with red hair wearing a t-shirt and a skirt is standing behind the girl far away.

Depth can be specified.

Anima

@artist tag, masterpiece, best quality, 1girl, indoors, carpet.

A girl with long hair wearing a school swimsuit is standing on the floor. 

# arms

Her left arm is straight up, and her right arm is straight out to the side. 

# legs

She is stepping on the sofa seat with her right foot and her left foot is on the floor. 


Negative prompt

worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia, sketch, english text, engrish text, chinese text, korean text

No matter how many times I tried, it could not produce the pose described in the prompt.

FLUX.2 klein 9b

Anime style.

A girl with long hair wearing a competition swimsuit is standing on the floor. 

# arms

Her left arm is straight up, and her right arm is straight out to the side. 

# legs

She is stepping on the sofa seat with her right foot and her left foot is on the floor. 

# room
There is a carpet.

I retried dozens of times. With three or more character tags, the success rate of drawing each character distinctly is very low.

@artist tag, masterpiece, best quality, 3girls,

hatsune miku is wearing a grey shirt with bare shoulders. She is lying on back.

kagamine rin with short blonde hair wearing a sailor shirt. girl on top. She is straddling on hatsune miku.

megurine luka with pink long hair with crossed arms. She is standing behind them. She is looking at the girls.

# background 

indoors, window, curtain, 


Negative prompt

worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia

@artist tag, masterpiece, best quality,

kasane teto (sv), red drill hair, grey jacket, layered skirt


Negative prompt

worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia

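Although this is a preview release, its accuracy for 1girl images is high, so it has likely already gone through a substantial amount of pretraining.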

sensitive, newest, year 2025, anime screenshot, 1girl, outdoors, school uniform, from below


Negative prompt

@thick-paint (atsunuri) style artist, worst quality, low quality, score_1, score_2, score_3, score_4, blurry, jpeg artifacts, sepia, sketch, english text, engrish text, chinese text, korean text

Quality tags (masterpiece, score_9, etc.) make the output look more like an illustration, so remove them if you want an anime-screenshot look.
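For the anime-screenshot variant, a small helper can drop the quality tags from an existing positive prompt and keep everything else in order. Hypothetical helper; it assumes comma-separated tags, and the quality-tag set mirrors the tag table in the tag-order section. The negative prompt is left alone, as in the example above.

```python
QUALITY_TAGS = {
    "masterpiece", "best quality", "good quality",
    "normal quality", "low quality", "worst quality",
} | {f"score_{n}" for n in range(1, 10)}


def strip_quality_tags(prompt: str) -> str:
    """Remove quality tags from a comma-separated positive prompt, preserving order."""
    tags = [t.strip() for t in prompt.split(",")]
    return ", ".join(t for t in tags if t and t.lower() not in QUALITY_TAGS)


print(strip_quality_tags("masterpiece, score_9, anime screenshot, 1girl, outdoors"))
# anime screenshot, 1girl, outdoors
```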

@artist name, 1girl, chibi, portrait, white background

frieren, capelet, floating earrings, gold trim, green eyes, pointy ears, sleeve cuffs, smile, striped shirt, finger to mouth, sideways glance, looking at viewer, half-closed eyes


Negative prompt

backlighting, worst quality, low quality, score_1, score_2, score_3, score_4, blurry, jpeg artifacts, sepia, sketch, english text, engrish text, chinese text, korean text

@artist tag, masterpiece, best quality, 2boys, 1girl,

The three of them are lined up in a row.

# girl
The standing girl is wearing a school uniform, ahegao, double v.

# boy 1
A standing boy wearing a t-shirt and pants is expressionless and looking looking outside(left) on the left.

# boy 2
Another boy wearing a school swimsuit is looking at the girl with surprising, open mouth. He is standing on the right. crossed arms.

# background

indoors, classroom, chalkboard


Negative prompt

worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia, sketch

The # from the Markdown headings leaked into the image, so the model probably does not understand Markdown.
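Since the # markers leaked into the image, one option is to keep the section headings only for your own editing and strip them before the prompt is sent. A minimal sketch; treating every line that starts with # as a heading is an assumption, and whether the model still benefits from the section grouping once the markers are gone is not established.

```python
def strip_heading_lines(prompt: str) -> str:
    """Drop lines starting with '#' and remove the blank lines left behind."""
    kept = [
        line.strip()
        for line in prompt.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return "\n".join(kept)


sectioned = """masterpiece, best quality, 1girl

# background
indoors, classroom, chalkboard
"""
# Prints the two tag lines, without the "background" heading line.
print(strip_heading_lines(sectioned))
```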

@artist tag, masterpiece, best quality, 3girls, multiple views, brown long hair, indoors, classroom, chalkboard,

# Left
The standing girl is wearing a school uniform, v, smile.

# Middle
The girl is wearing a bra and a panties, embarrassed.

# Right
The angry girl wearing a school swimsuit. pointing at viewer. 


Negative prompt

worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia, sketch, english text, engrish text, chinese text, korean text

@artist tag, masterpiece, best quality, 2koma, comic, 1girl, 1boy, indoors

# koma 1

1girl, solo, :D, seductive smile, brown eyes, long eyelashes, looking at viewer, portrait, collared shirt, suit jacket, id card, straight-on, head tilt

# koma 2

from side, nude, sex from behind, holding another's wrist, doggy style, torogao, perky breasts, ass, closed eyes


Negative prompt

worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia, sketch, english text, engrish text, chinese text, korean text

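Three panels stacked vertically or three panels in a horizontal row can be requested, but a layout such as one large panel on top with the bottom split into left and right panels cannot be specified with the prompt alone.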

Other information

Text encoders of lightweight models

	Z-Image (Turbo)	FLUX.2 Klein 4b	Newbie (Lumina-Image 系列)	Netayume (Lumina-Image 2.0)	Anima (Cosmos-Predict2-2B)
Text encoder	Qwen 3	Qwen 3	Gemma 3	Gemma 2	Qwen 3
Text encoder parameters	4b	4b	4b	2b	0.6b

According to the circlestone-labs/Anima license, Anima is a derivative of NVIDIA's Cosmos-Predict2-2B-Text2Image.

Anima's text encoder is Qwen3 0.6B and its VAE is the Qwen-Image VAE. Cosmos-Predict2-2B-Text2Image, by contrast, uses T5XXL as its text encoder and the Wan2.1 VAE.

nvidia/Cosmos-Predict2-2B-Text2Image

Cosmos-Predict2 World Simulation Model for Physical AI

It is probably intended for use in robot reinforcement learning. Its training dataset contains a large amount of in-vehicle, factory, and kitchen video.

The text encoder is the encoder-only part of T5 XXL, with roughly 4.7B parameters. SD3 uses the same text encoder.

The VAE is Wan-AI/Wan2.1-T2V-1.3B-Diffusers.

The recommended prompt length is 300 words or fewer, and the base resolution is 1280 x 704. 720 is not divisible by 64, so it is rounded down to 11 x 64 = 704.
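The rounding above is a floor to the nearest multiple of 64. A short sketch of that arithmetic (the helper name is illustrative):

```python
def floor_to_multiple(value: int, multiple: int = 64) -> int:
    """Round value down to the nearest multiple (720 -> 704 for 64)."""
    return (value // multiple) * multiple


print(floor_to_multiple(1280), floor_to_multiple(720))  # 1280 704
print(720 // 64)  # 11
```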

Limitations

Trying to generate high-resolution images produces artifacts.